(57) Abstract: A method for generating stimulus response curves (e.g., dose response
curves) shows how the phenotype of one or more cells changes in response to varying
levels of the stimulus. Each "point" on the curve represents a quantitative phenotype
for the cell(s) at a particular level of stimulus (e.g., dose of a therapeutic). The
quantitative phenotypes are multivariate phenotypic representations of the cell(s).
They include various features of the cell(s) obtained by image analysis. Such features
often include basic parameters obtained from images (e.g., cell shape, nucleus area,
Golgi texture) and/or biological characterizations derived from the basic parameters
(e.g., cell cycle state, mitotic index, etc.). The stimulus response curves may be
compared to allow classification of stimuli and identify subtle differences in related
stimuli. To facilitate the comparison, it may be desirable to present the response
curves in a principal component space.



WO 02/067182 PCT/US02/05553






PATENT APPLICATION 

CHARACTERIZING BIOLOGICAL STIMULI BY 
RESPONSE CURVES 

BACKGROUND OF THE INVENTION

The present invention relates to techniques for determining the response of
biological cells to varying levels of a particular stimulus. More specifically, the
invention relates to response curves derived from multivariate phenotypic data
extracted from images of biological cells.

Purified substances having a desirable combination of bio-active properties are
rare and often difficult to identify. Recent advances in traditional organic chemistry
and the development of rapid combinatorial chemistry techniques have increased the
number of compounds that researchers can test for a specific biological activity (e.g.,
binding to a target). Unfortunately, the vast majority of "hits" generated by such
techniques do not possess the right combination of properties to qualify as therapeutic
compounds. When these substances are subjected to low throughput cellular and
animal tests to establish their therapeutic usefulness, they are typically found to fail in
some regard. Unfortunately, such tests are time consuming and costly, thus limiting
the number of substances that can be tested. In a like regard, the few hits that do
possess the right combination of properties avoid recognition until after the
throughput tests are conducted. With better early evaluation techniques, such
promising candidates could be identified earlier in the development process and put
on a fast track to the marketplace.

Various early evaluation techniques are under investigation and some have
shown promise. In particular, cellular phenotyping technologies employing
sophisticated image analysis have proven very useful in characterizing therapeutic
chemicals. Such technologies are generally described in WO/00/70528 published on
November 23, 2000. These techniques attempt to classify compounds based on
phenotypic changes that they induce. From these changes, detailed mechanisms of
action can be deduced.

Typically, researchers attempting to classify a new compound based on
mechanism of action wish to know how that compound compares to other known
therapeutics. Compounds that exhibit similar biological functioning in some regards
may exhibit similarity in other regards as well. One difficulty in assessing similarity
is that compounds often have greatly varying potencies. In other words, while two
different compounds may operate by the same or similar mechanism of action, one
compound may operate at a much lower concentration than the other compound. It is
difficult to make a meaningful comparison of two such compounds until the dose scales
of these compounds have been adjusted. To this end, researchers often use dose
response curves to compare compounds. These curves show the biological
effectiveness of particular drugs over multiple concentrations. The effect of the drug
at each different concentration provides the "points" for the dose response curves.

Typically, such dose response curves are limited to a single particular
biological parameter (e.g., cell count or expression of a protein). The numeric value
of such parameter is provided as a function of concentration for each compound of
interest. The resulting curves can be compared to identify similar trajectories. Two
compounds having similar trajectories might be expected to operate by the same
mechanism of action, depending upon which biological parameter is being considered.
Unfortunately, there are significant limits to the value of such comparisons. Most
importantly, many different parameters may contribute to a mechanism's signature.
So a simple dose response curve may fail to shed light on a mechanism.

While image analysis techniques for characterizing phenotypes can provide
many different characteristics of a compound, their full potential has not yet been
realized. Particularly, it would be useful if such techniques could be applied to obtain
meaningful dose response information for compounds or other stimuli under
investigation.

SUMMARY OF THE INVENTION 

The present invention provides a method, program code, and apparatus for
generating stimulus response curves (e.g., dose response curves) showing how the
phenotype of one or more cells changes in response to varying levels of the stimulus.
Each "point" on the curve represents a quantitative phenotype for the cell(s) at a
particular level of stimulus (e.g., dose of a therapeutic). The quantitative phenotypes are
multivariate phenotypic representations of the cell(s). They include various features
of the cell(s) obtained by image analysis. Such features often include basic
parameters obtained from images (e.g., cell shape, nucleus area, Golgi texture) and/or
biological characterizations derived from the basic parameters (e.g., cell cycle state,
mitotic index, etc.). The stimulus response curves may be compared to allow
classification of stimuli and identify subtle differences in related stimuli. To facilitate
the comparison, it may be desirable to present the response curves in a principal
component space.

One specific aspect of the invention provides a method for determining the
response of cells to multiple levels of a stimulus. The method may be characterized
by the following sequence: (a) obtaining feature values which characterize the
phenotype of cells exposed to a particular level of the stimulus to produce a
"quantitative phenotype"; (b) repeating (a) for each of the multiple levels of stimulus
to thereby produce a separate quantitative phenotype of the cells at each level of
stimulus; and (c) identifying a path through the separate quantitative phenotypes of
cells exposed to the stimulus. The stimulus can take many different forms. Examples
include exposure to chemical compounds, exposure to biological agents, exposure to
electromagnetic radiation, exposure to particle radiation, exposure to an electrical or
magnetic field or force, exposure to a mechanical field or force, and combinations of
these. In some cases, the multiple levels of stimulus are multiple durations after an
initial exposure to the stimulus. In this embodiment, the cells are analyzed at various
times after exposure.

In particularly preferred embodiments, at least some feature values comprising
the quantitative phenotypes are obtained from an image of the cells. These feature
values may characterize cell morphology, statistical features of cells (sometimes
derived from intensity histograms), biological classification of the cells, and the like.
In one example, a biological classification specifies a cell cycle state.

Sometimes a graphical representation of the path provides the most useful
information. In a particularly preferred embodiment, the graphical representation is
provided along one or more principal components obtained via a principal component
analysis.

Another aspect of the invention pertains to apparatus for analyzing images of
cells exposed to multiple levels of a stimulus and generating a response path based on
those images. The apparatus includes at least (a) an interface configured to receive the
images of the cells that have been exposed to said multiple levels of a stimulus; (b) a
memory for storing, at least temporarily, some or all of the images; and (c) one or
more processors in communication with the memory and designed or configured to
generate the response path by a technique of the type described herein. Typically, the
apparatus will also include a display that is capable of graphically depicting the
response path.

As indicated, the invention provides particular value when used to determine
whether a first compound and a second compound act on cells by a related mechanism
of action. Thus, another aspect of the invention may be characterized by the following
sequence: (a) for each of multiple concentrations of the first compound, obtaining a
plurality of feature values characterizing the phenotype of cells exposed to the
particular concentration of the first compound, to thereby produce a plurality of first
concentration-specific phenotypes; (b) identifying a first path through the first
concentration-specific phenotypes of cells exposed to the first compound; (c) for each
of multiple concentrations of the second compound, obtaining a plurality of feature
values characterizing the phenotype of cells exposed to the particular concentration of
the second compound, to thereby produce a plurality of second concentration-specific
phenotypes; (d) identifying a second path through the second concentration-specific
phenotypes of cells exposed to the second compound; and (e) comparing the first and
second paths, wherein a degree of similarity between the paths corresponds to a
degree of similarity in the mechanism of action of the first and second compounds. In
some particularly valuable applications, at least one of the first and second compounds
is a known therapeutic or potential therapeutic.

The concentrations of the compounds should vary over an active range. The
multiple concentrations of the first compound typically vary from lowest to highest by
a factor of at least about two. Preferably, the multiple concentrations of the first
compound include at least five separate concentrations of the first compound, and
more preferably at least eight separate concentrations of the first compound.
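
By way of illustration only, the stated preferences for a concentration series can be checked with a short routine. The following is a minimal sketch, not part of the described method: the function name, its parameters, and the eight-point two-fold dilution series are hypothetical values chosen for the example.

```python
def check_concentration_series(concentrations, min_points=5, min_fold_range=2.0):
    """Return (ok, fold_range) for a list of positive concentrations.

    ok is True when the series has at least `min_points` distinct
    concentrations and the highest exceeds the lowest by at least
    `min_fold_range`. Both thresholds are illustrative defaults.
    """
    distinct = sorted(set(concentrations))
    if not distinct or distinct[0] <= 0:
        raise ValueError("concentrations must be positive and non-empty")
    fold_range = distinct[-1] / distinct[0]
    ok = len(distinct) >= min_points and fold_range >= min_fold_range
    return ok, fold_range


# Hypothetical eight-point, two-fold dilution series (arbitrary units).
series = [100.0 / 2 ** i for i in range(8)]
ok, fold_range = check_concentration_series(series, min_points=8)
print(ok, fold_range)
```

A series that fails the check would simply be extended or re-spaced before the cells are treated.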

As mentioned above in the context of the first aspect of the invention, the
feature values may be provided from a number of different sources. Particularly
valuable phenotypic features are provided by image analysis and associated processes.
In a particularly preferred embodiment of this aspect of the invention, the feature
values include numeric values characterizing one or more of the following cellular
components: DNA, Golgi, cytoskeletal components such as tubulin and actin, and the
plasma membrane. In a specific embodiment, the plurality of feature values include
numeric values characterizing one or more of the following cellular components:
DNA, Golgi, and tubulin.

In one approach, the comparison simply involves graphically depicting the first
and second paths together. Preferably, the graphical depiction presents the first and
second paths in a space defined by principal components. Thus, in some
embodiments, the method also involves using the concentration-specific phenotypes in
a technique that provides a reduced-dimensionality space in which to depict the paths
(e.g., principal component analysis, linear and non-linear discriminant analysis,
multidimensional scaling, and projection pursuit techniques).

Another aspect of the invention pertains to computer program products
including machine-readable media on which are stored program instructions for
implementing at least some portion of the methods described above. Any of the
methods of this invention may be represented, in whole or in part, as program
instructions that can be provided on such computer readable media. In addition, the
invention pertains to various combinations of data and data structures generated
and/or used as described herein.

These and other features and advantages of the present invention will be
described in more detail below with reference to the associated figures.



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a process flow chart depicting the preparation and use of a stimulus
response curve based upon phenotypic data.

Figure 2 is a simplified block diagram of a computer system that may be used
to implement various aspects of this invention such as the various image analysis
algorithms of this invention.

Figure 3A is a plot of several dose response curves for compounds having
known effects on targets; the curves are presented in a space defined by three principal
components defined for multivariate phenotypic information.

Figure 3B is a plot of the dose response curves of only the farnesyltransferase
and geranylgeranyltransferase inhibitors from Figure 3A.

Figure 3C is a plot of the dose response curves of only the actin and tubulin
inhibitors from Figure 3A.

Figure 3D is a plot of the dose response curves of only the compounds from
Figure 3A that affect signaling pathways directly.

Figure 4 is a PCA plot that highlights the deviation of the actin inhibitor
Cytochalasin A from the other actin inhibitors.

Figures 5A and 5B are plots of simple dose response curves of a tubulin
feature as it varies with concentration of Cytochalasin A and Cytochalasin J,
respectively.

Figure 6 is a dendrogram showing the compounds of Figure 3A at the IC50 for
A549 cells.

Figures 7A and 7B are plots of simple dose response curves, across cell lines,
using cell count as an indicator of the potency of Mastoparan and its synthetic analog
MAS7, respectively.

Figure 8 is a PCA plot showing the dose response paths for 83 different
oncology compounds clustered in groups representative of the mechanism of action.

Figure 9 is a PCA plot showing compounds that have biochemical activity
against an oncology target described in the examples.

Figure 10 is a PCA plot showing a subset of the compounds in Figure 9, which
subset was identified in a primary screen.

Figure 11 is a plot showing a region of PCA space that represents the optimal
profile for inhibition of an oncology target.

Figure 12 is a zoomed-in figure of the compounds from Figure 11 in PCA
space.

Figure 13 is a process flow chart showing how a single multivariate dose 
response experiment was used to identify and narrow down 57 hits to eight highly 
potent and specific compounds. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention allows for comparison and visualizing of response
curves in multidimensional space. The response curves may span various levels of a
stimulus, with each point in the curve representing a different level of the stimulus.
For example, each point might represent a different concentration or dose of a chemical
compound. Alternatively, each point in the curve may represent a different time after
initial exposure to a chemical compound. Importantly, each point in the response
curve contains multivariate information about a cell's or population of cells' response
to a particular level of the stimulus. Preferably, this multivariate information contains
some phenotypic information about the cell. Such phenotypic information may
provide morphological details, statistical details, and/or higher level biological
characterizations of the cell or cell population. In an especially preferred
embodiment, such features are extracted directly or indirectly from images of the
cells. Of course, the multivariate information in the data points may include
non-phenotypic information as well. Such information can derive from any of a number of
different tests and/or other sources such as public literature and databases.

One important advantage of the present invention is that it allows related
stimuli to be compared in a manner that accounts for complicated interactions
between multiple phenotypic variables. Such comparisons help identify trends and
allow characterization of particular stimuli. The comparisons may be accomplished
by a computing device and/or human observers. To the extent that human observers
are involved in the comparison, it will be beneficial to depict the multivariate response
curves in a space that emphasizes variations in the data. For example, the invention
may involve depicting the response paths in a space defined by principal components.
In this manner, complicated multivariate data is depicted so that it can be easily
comprehended. To the extent that a quantitative comparison is required, a computing
device may compare two or more response curves by any of a number of techniques.
Such techniques include distance techniques, clustering techniques and the like.
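
As one hedged illustration of a distance technique, two response curves can be compared by the mean distance from each point on one curve to the nearest point on the other, averaged over both directions. This particular measure is an assumption made for the example and is not prescribed by the description; it is chosen because nearest-point matching does not rely on the dose labels, so two curves that follow the same trajectory at different potencies can still score as close. All names and numeric values below are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def directed_distance(path_a, path_b):
    """Mean distance from each point of path_a to its nearest point on path_b."""
    return sum(min(euclidean(p, q) for q in path_b) for p in path_a) / len(path_a)


def path_distance(path_a, path_b):
    """Symmetric path distance: average of the two directed distances."""
    return 0.5 * (directed_distance(path_a, path_b) + directed_distance(path_b, path_a))


# Hypothetical two-feature response curves (one point per stimulus level).
curve_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
curve_b = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]

print(path_distance(curve_a, curve_a))
print(path_distance(curve_a, curve_b))
```

A matrix of such pairwise distances could then be passed to any standard clustering routine to group stimuli with similar curves.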


Process Overview and Relevant Definitions 

Some of the terms used herein are not commonly used in the art. Other terms may
have multiple meanings in the art. Therefore, the following definitions are provided
as an aid to understanding the description that follows. The invention as set forth in
the claims should not necessarily be limited by these definitions.

The term "component" or "component of a cell" refers to a part of a cell
having some interesting property that can be employed to derive biologically relevant
information using image analysis. General examples of cell components include
biomolecules and cellular organelles. Specific examples of biomolecules that could
serve as cell components for use with this invention include proteins, lipids,
polysaccharides, etc. Sometimes, the relevant component will refer to a
group of structurally or functionally related biomolecules. Alternatively, the
component may represent a portion of a biomolecule such as a polysaccharide group
on a protein, or a particular sequence of a nucleic acid or protein. Collections of
molecules such as micelles can also serve as cellular components for use with this
invention. And subcellular structures such as vesicles and organelles may also serve
the purpose.

The term "marker" or "labeling agent" refers to materials that specifically bind
to and label cell components. These markers or labeling agents should be detectable
in an image of the relevant cells. Typically, a labeling agent emits a signal whose
intensity is related to the concentration of the cell component to which the agent
binds. Preferably, the signal intensity is directly proportional to the concentration of
the underlying cell component. The location of the signal source (i.e., the position of
the marker) should be detectable in an image of the relevant cells.

Preferably, the chosen marker binds indiscriminately with its corresponding
cellular component, regardless of location within the cell. In other
embodiments, however, the chosen marker may bind to specific subsets of the
component of interest (e.g., it binds only to sequences of DNA or regions of a
chromosome). The marker should provide a strong contrast to other features in a given
image. To this end, the marker should be luminescent, radioactive, fluorescent, etc.
Various stains and compounds may serve this purpose. Examples of such compounds include
fluorescently labeled antibodies to the cellular component of interest, fluorescent
intercalators, and fluorescent lectins. The antibodies may be fluorescently labeled
either directly or indirectly.

The term "stimulus" refers to something that may influence the biological
condition of a cell. Often the term will be synonymous with "agent" or
"manipulation." Stimuli may be materials, radiation (including all manner of
electromagnetic and particle radiation), forces (including mechanical (e.g.,
gravitational), electrical, magnetic, and nuclear), fields, thermal energy, and the like.
General examples of materials that may be used as stimuli include organic and
inorganic chemical compounds, biological materials such as nucleic acids,
carbohydrates, proteins and peptides, lipids, various infectious agents, mixtures of the
foregoing, and the like. Other general examples of stimuli include non-ambient
temperature, non-ambient pressure, acoustic energy, electromagnetic radiation of all
frequencies, the lack of a particular material (e.g., the lack of oxygen as in ischemia),
temporal factors, etc.

Specific examples of biological stimuli include exposure to hormones, growth
factors, antibodies, or extracellular matrix components, or exposure to biologics such
as infective materials, for example viruses that may be naturally occurring or
engineered to express exogenous genes at various levels. Biological stimuli
could also include delivery of antisense polynucleotides by means such as gene
transfection. Stimuli also could include exposure of cells to conditions that promote
cell fusion. Specific physical stimuli could include exposing cells to shear stress
under different rates of fluid flow, exposure of cells to different temperatures,
exposure of cells to vacuum or positive pressure, or exposure of cells to sonication.
Another stimulus includes applying centrifugal force. Still other specific stimuli
include changes in gravitational force, including sub-gravitation, and application of a
constant or pulsed electrical current. Still other stimuli include photobleaching, which
in some embodiments may include prior addition of a substance that would
specifically mark areas to be photobleached by subsequent light exposure. In
addition, these types of stimuli may be varied as to time of exposure, or cells could be
subjected to multiple stimuli in various combinations and orders of addition. Of
course, the type of manipulation used depends upon the application.

The term "phenotype" generally refers to the total appearance of an organism
or cell from an organism. In the context of this invention, cellular phenotypes and
their representations in processing systems (e.g., computers) are particularly
interesting. A given cell's phenotype is a function of its genetic constitution and
environment. Often a particular phenotype can be correlated or associated with a
particular biological condition or mechanism of action resulting from exposure to a
stimulus. Generally, cells undergoing a change in biological conditions will undergo a
corresponding change in phenotype. Thus, cellular phenotypic data and
characterizations may be exploited to deduce mechanisms of action and other aspects
of cellular responses to various stimuli.

A selected collection of data and characterizations that represent a phenotype
of a given cell or group of cells is sometimes referred to as a "quantitative cellular
phenotype." This combination is also sometimes referred to as a phenotypic
fingerprint or just "fingerprint." The multiple cellular attributes or features of the
quantitative phenotype can be collectively stored and/or indexed, numerically or
otherwise. The attributes are typically quantified in the context of specific cellular
components or markers. Measured attributes useful for characterizing an associated
phenotype include morphological descriptors (e.g., size, shape, and/or location of the
organelle) and composition (e.g., concentration distribution of particular biomolecules
within the organelle). Other attributes include changes in a migration pattern, a
growth rate, cord formation, an extracellular matrix deposition, and even cell count.

The quantitative phenotypes may themselves serve as individual points on
response curves of this invention. A phenotypic response to stimulus may be
characterized by exposing various cell lines to a stimulus of interest at various levels
(e.g., doses of radiation or concentrations of a compound). At each level within this
range, the phenotypic descriptors of interest are measured to generate quantitative
phenotypes associated with levels of stimulus.

The term "path" or "response curve" refers to the characterization of a
stimulus at various levels. For example, the path may characterize the effect of a
chemical supplied at various concentrations or the effect of electromagnetic radiation
provided to cells at various levels of intensity or the effect of depriving a cell of
various levels of a nutrient. Mathematically, the path is made up of multiple points,
each at a different level of the stimulus. In accordance with this invention, each of
these points is preferably a collection of parameters or characterizations describing
some aspect of a cell or collection of cells. Typically, at least some of these
parameters and/or characterizations are derived from images of the cells. In this
regard, they represent quantitative phenotypes of the cells. In the sense that each point
in the path may contain more than one piece of information about a cell, the points
may be viewed as arrays, vectors, matrices, etc. To the extent that the path connects
points containing phenotypic information (separate quantitative phenotypes), the path
itself may be viewed as a "concentration-independent phenotype."
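
The view of a path as an ordered series of vectors can be sketched in code. The following is a minimal illustration under stated assumptions: the class name, field names, and numeric values are hypothetical, and ordering the points by stimulus level is one simple way of "identifying a path" rather than the only one contemplated.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QuantitativePhenotype:
    """One point on a response path (hypothetical layout)."""
    level: float                 # stimulus level, e.g. concentration or time
    features: Tuple[float, ...]  # multivariate feature vector for the cell(s)


def build_path(phenotypes: List[QuantitativePhenotype]) -> List[QuantitativePhenotype]:
    """Order the points by stimulus level to form a response path."""
    ordered = sorted(phenotypes, key=lambda p: p.level)
    # Every point must describe the same features in the same order.
    if len({len(p.features) for p in ordered}) > 1:
        raise ValueError("all points must share the same feature layout")
    return ordered


# Hypothetical measurements supplied in arbitrary order.
points = [
    QuantitativePhenotype(level=10.0, features=(0.9, 0.4)),
    QuantitativePhenotype(level=1.0, features=(0.2, 0.1)),
    QuantitativePhenotype(level=5.0, features=(0.6, 0.3)),
]
path = build_path(points)
print([p.level for p in path])
```

Once the points are ordered, the level labels can be set aside and the sequence of feature vectors alone treated as the trajectory.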

As used herein, the term "feature" refers to a phenotypic property of a cell or
population of cells. Typically, the points in a response curve of this invention are
each comprised of multiple features. The terms "descriptor" and "attribute" may be
used synonymously with "feature." Features derived from cell images include both
the basic "parameters" extracted from a cell image and the "biological
characterizations" (including biological classifications such as cell cycle states). The
latter example of a feature is typically obtained from an algorithm that acts on the
basic parameters. The basic parameters are typically morphological, concentration,
and/or statistical values obtained by analyzing a cell image showing the positions and
concentrations of one or more markers bound within the cells.
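
The distinction between basic parameters and derived biological characterizations can be illustrated with the mitotic index, which is the fraction of cells in a population that are in mitosis. The sketch below assumes that some upstream classifier, not shown, has already assigned each cell a cell cycle state label; the label strings and the sample population are hypothetical.

```python
from collections import Counter


def mitotic_index(cell_states, mitotic_label="M"):
    """Fraction of cells whose assigned cell cycle state is mitosis.

    `cell_states` holds one label per cell; the labels are assumed to come
    from a separate classification step acting on basic image parameters.
    """
    if not cell_states:
        raise ValueError("at least one cell is required")
    counts = Counter(cell_states)
    return counts[mitotic_label] / len(cell_states)


# Hypothetical per-cell classifications for one well.
states = ["G1", "G1", "S", "G2", "M", "M", "G1", "S"]
print(mitotic_index(states))
```

The resulting value is a single population-level feature that can sit alongside the basic parameters in a quantitative phenotype.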

Figure 1 depicts a sample process flow for generating and using response paths
in accordance with an embodiment of this invention. As depicted in Figure 1, a
process 101 begins by identifying a collection of chemical compounds for use in the
analysis. See block 103. This operation may be performed by a computing apparatus
or possibly by one or more human beings. The compounds selected at 103 will
ultimately be used to generate data that defines a "phenotypic space" for comparing
multiple response paths.

After the relevant collection of chemical compounds has been identified at
103, the process next selects one current compound at 105. Each compound
represents a cycle in an iterative process in which multiple compounds are analyzed to
generate relevant phenotypic data. Each new iteration begins with operation 105. In
practice, multiple compounds may be analyzed in parallel, so the iterative/sequential
nature of the process may not be strictly accurate. Regardless of how the process is
depicted, multiple compounds are evaluated at some point. The flow chart simply
depicts this fact.

With a current compound selected, the process next selects a particular
combination of compound dose and cell line for application of the dose. See block
107. In a preferred embodiment, each compound has an associated matrix of cell lines
and doses. This matrix represents the fact that multiple distinct cell lines are treated
with the compound of interest, each at multiple doses. Each combination of dose and
cell line provides separate phenotypic information. Ultimately, the response curve
passes through distinct points, each representing a separate dose. At each dose, the
phenotypic information spans multiple cell lines. In principle, the points on the
response path can be confined to a single cell line.

After the current combination of dose and cell line has been selected and
provided, the process next images the cells of the current cell line that have been
exposed to the current compound at the current dose. See block 109. If more than one
cell component is to be considered, the imaging apparatus may generate multiple
images, one for each cell component/marker combination. At 111, the process
performs an image analysis that measures and stores parameter values. In some
embodiments, these features will be separately extracted from multiple images of the
cell line taken at different times after exposure to the compound. At 113, the process
determines whether there are additional combinations of dose and cell lines to be
considered. If so, process control returns to 107 where the next combination of dose
and cell line is selected.
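
The loop over blocks 107 through 113 amounts to visiting every cell of the dose by cell line matrix. A minimal sketch follows, assuming that imaging and image analysis are wrapped in a caller-supplied function; `collect_features`, `fake_measure`, and the second cell line name are hypothetical stand-ins and do not correspond to any particular apparatus.

```python
from itertools import product


def collect_features(doses, cell_lines, measure):
    """Visit every (dose, cell line) combination and store its feature values.

    `measure(dose, cell_line)` is a placeholder for imaging the treated cells
    and extracting parameter values from the images.
    """
    results = {}
    for dose, cell_line in product(doses, cell_lines):
        results[(dose, cell_line)] = measure(dose, cell_line)
    return results


def fake_measure(dose, cell_line):
    """Stand-in measurement that returns deterministic dummy values."""
    return [dose, float(len(cell_line))]


doses = [0.1, 1.0, 10.0]
cell_lines = ["A549", "hypothetical_line_B"]
features = collect_features(doses, cell_lines, fake_measure)
print(len(features))
```

Because each combination is independent of the others, the same matrix could equally be processed in parallel, consistent with the remark above that the sequential depiction is only one way to view the process.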

Ultimately, all the relevant combinations of dose and cell line for a given
compound have been imaged and analyzed. At that point, process control proceeds to
block 115 where the system combines feature values across multiple cell lines to
obtain separate phenotypic vectors for each separate dose. These phenotypic vectors
represent the individual points in a response path associated with the current
compound.
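
One straightforward way to combine feature values across cell lines is to concatenate the per-cell-line feature lists in a fixed cell line order, giving one vector per dose. Concatenation is an assumption made for this sketch; the description does not specify how the values are combined. The function name and all values below are hypothetical.

```python
def phenotypic_vectors(features_by_combo, doses, cell_lines):
    """Build one phenotypic vector per dose by concatenating cell line features.

    `features_by_combo` maps (dose, cell_line) to a list of feature values.
    The cell line order is fixed so that every vector has the same layout.
    """
    vectors = {}
    for dose in doses:
        vector = []
        for cell_line in cell_lines:
            vector.extend(features_by_combo[(dose, cell_line)])
        vectors[dose] = vector
    return vectors


# Hypothetical feature values for two doses and two cell lines.
features_by_combo = {
    (0.1, "A549"): [1.0, 2.0],
    (0.1, "line_B"): [3.0, 4.0],
    (1.0, "A549"): [5.0, 6.0],
    (1.0, "line_B"): [7.0, 8.0],
}
vectors = phenotypic_vectors(features_by_combo, [0.1, 1.0], ["A549", "line_B"])
print(vectors[0.1])
```

Keeping the cell line order fixed matters because downstream comparisons assume that a given position in the vector always refers to the same feature of the same cell line.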

At 117, the process determines whether there are more compounds to be
considered as part of the analysis. If so, process control returns to block 105 where
the next current compound is selected. Thereafter, that compound is treated as
described above with respect to blocks 107 through 115.

Note that some or all of the operations described above for each compound
may be automated and performed by a machine. The machine operations may be
performed by various image acquisition and image analysis apparatus.

After each of the compounds from the collection identified at 103 has been
analyzed as described above, the process has numerous phenotypic vectors
(quantitative phenotypes), each of which is associated with a particular combination
of chemical compound and dose. Each of the vectors represents a point in
multidimensional space. The numerous dimensions may be difficult to depict in a
manner that presents meaningful information to a human viewer. Therefore, process
101 next reduces the dimensionality of feature space and depicts response paths for
each compound in the reduced dimensional space. See 119. One preferred approach
to this involves performing a principal component analysis on the collection of
separate phenotypic vectors. After the reduced dimensional space has been generated,
the system may next compare the separate paths of the individual compounds. See
121. This can provide relevant information about the mechanism of action of the
various compounds. It allows a human or computer algorithm to compare the various
paths and draw conclusions about the mechanisms of action of the various
compounds. Note that it is not strictly necessary to depict the response paths in a
reduced dimensional space prior to comparing the separate paths. Thus, if a
computing device is used to do the comparison, then operation 119 may be optional.
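
The dimensionality reduction at 119 can be sketched with a small principal component analysis. The version below is a minimal illustration that uses only the standard library: it estimates the leading principal components of the pooled phenotypic vectors by power iteration with deflation, which is one of several ways to compute them and is an implementation choice made for this example. The two response paths and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def covariance_matrix(rows):
    """Return (column means, sample covariance matrix) for a list of vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: approximate the dominant eigenpair of a symmetric matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return eigenvalue, v


def principal_components(rows, k=2):
    """Return (means, first k principal axes) of the pooled vectors."""
    means, cov = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vec = leading_eigenvector(cov)
        components.append(vec)
        # Deflation: remove the variance explained by the axis just found.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return means, components


def project(path, means, components):
    """Express each point of a path in the reduced principal component space."""
    return [[dot([x - m for x, m in zip(point, means)], c) for c in components]
            for point in path]


# Hypothetical three-feature response paths for two compounds (one row per dose).
path_a = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.1], [2.0, 4.0, 0.0], [3.0, 6.0, 0.2]]
path_b = [[0.0, 0.0, 1.0], [1.0, 2.1, 1.2], [2.0, 3.9, 0.9], [3.0, 6.0, 1.1]]

# The space is defined from all vectors pooled, then each path is projected into it.
means, components = principal_components(path_a + path_b, k=2)
reduced_a = project(path_a, means, components)
reduced_b = project(path_b, means, components)
print(reduced_a[0], reduced_b[0])
```

Fitting the components on the pooled vectors of all compounds, rather than on each compound separately, is what places the separate paths in a common space where they can be compared.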

Note that the discussion of process 101 treats exposure to chemical 
compounds as the stimuli of interest. The process 101 can be extended to cover any 
particular stimulus, not just exposure to chemical compounds. As mentioned, stimuli 
of interest to the present invention include exposure to biological agents, exposure to
various fields, forces, and radiation, deprivation of agents important for normal cell
growth and functioning, etc. 

Also, alternative definitions of response path that do not involve variation over
dose or time could be employed. For example, a path could be provided through
multiple distinct cell lines, where each point on the path represents a different cell
line.



Selecting Experiments for Providing Response Paths

Initially, a relevant collection of stimuli for consideration in the analysis must
be selected. As mentioned, the stimuli suitable for use with this invention span a wide
range of physical agents, forces, fields, etc. Generally, the collection of stimuli
chosen for a particular analysis may be selected with no prior assumptions. More
often, the stimuli are selected because they are believed to have related and interesting
effects on cells. In the case of potential therapeutic compounds, a number of chemical
compounds may be selected because they are believed to have a similar mechanism of
action when applied to particular cells. For example, compounds may be selected
because they are believed to possess anti-mitotic properties when applied to cancer
cells.

The data used for the analysis of the invention may be derived from a wide
range of experiments. Such experiments typically span a matrix of experimental
conditions. Such a matrix may include experimental variations in the choice of
stimulus, the level of each stimulus, the cell lines to which the stimulus is applied, and
the particular components within a cell line that are analyzed. Multiple compounds
may be applied in multiple concentrations to multiple cell lines. For each
combination of compound, dose, and cell line, multiple images may be obtained.
Each such image contains information about a separate component/marker
combination within the cell. Note that the invention is not limited to this wide-ranging
matrix. At its essence, the invention simply involves considering a single
stimulus at multiple levels. Of course, each such level should provide multivariate
data about a cell phenotype. However, it is unnecessary to employ multiple cell lines
and/or multiple cellular components in generating the relevant multivariate data.

The component/marker combinations used in a particular study should be
chosen based upon the area of interest. For example, oncology investigations may
require a different set of markers than cardiovascular investigations. Further, the
choice of markers should vary over a range of cell biology. For example, it typically
would be unnecessary to choose two separate markers that both image microtubules.
Depending upon the application, the markers can have a very high degree of
specificity, as in the case of an antibody for tubulin, or can have a lower degree of
specificity, as in the case of lectins. Note that some lectins, such as Lens culinaris
(LC) lectin, actually bind to various polysaccharides. Because most of the time these
polysaccharide components are enriched in the Golgi, LC lectin still can be an
effective marker for Golgi.

Generally, cell components tracked in presently preferable embodiments can
include proteins, protein modifications, genetically manipulated proteins, exogenous
proteins, enzymatic activities, nucleic acids, lipids, carbohydrates, organic and
inorganic ion concentrations, sub-cellular structures, organelles, plasma membrane,
adhesion complex, ion channels, ion pumps, integral membrane proteins, cell surface
receptors, G-protein coupled receptors, tyrosine kinase receptors, nuclear membrane
receptors, ECM binding complexes, endocytotic machinery, exocytotic machinery,
lysosomes, peroxisomes, vacuoles, mitochondria, Golgi apparatus, cytoskeletal
filament network, endoplasmic reticulum, nuclear membrane, proteosome apparatus,
chromatin, nucleolus, cytoplasm, cytoplasmic signaling apparatus, microbe
specializations and plant specializations.

The following table illustrates some cell components and markers (labeling
agents) that may be used in embodiments of the present invention. Other markers can
be used in various embodiments without departing from the scope of the invention.



Cell component | Marker or Component | Disease state
----------------------------------------------------------------------
Plasma membrane (including overall cell shape) | Carbocyanine dyes; Phosphatidylserine; Various lipids; Glycoproteins | Apoptosis-Cancer; [illegible]; [illegible]
Adhesion complexes | Cadherins; Integrins; Occludin; Gap junction; ERM proteins; CAMs; Catenins; Desmosomes | Thrombosis; Metastasis; Wound healing; Inflammatory Ds; Dermatologic Ds
Ion Channels and Pumps | Na/K ATPase; Calcium channels; Serotonin reuptake pump; CFTR; SERCA | Cystic fibrosis; Depression; Congestive Heart Failure; Epilepsy
G coupled receptors | β adrenergic receptor; Angiotensin receptor | Hypertension; Heart Failure; Angina
Tyrosine kinase receptors | PDGF receptor; FGF receptor; IGF receptor | Cancer; Wound healing; Angiogenesis; Cerebrovascular Ds
ECM binding complexes | Dystroglycan; Syndecan | Muscular Dystrophy
Endocytotic machinery | Clathrin; Adaptor proteins; COPs; Presenilins; Dynamin | Alzheimer's Ds
Exocytotic machinery | SNAREs; Vesicles | Epilepsy; Tetanus; Systemic Inflammation; Allergic Reactions
Lysosomes | Acid phosphatase; Transferrin; LysotrackerRed | Viral diseases
Peroxisomes/Vacuoles | | Neural degenerative Ds
Mitochondria | Caspases; Apoptosis inducing factor; F1 ATPase; Fluorescein; Cyclo-oxygenase; MitotrackerRed; Mitotracker Green | Apoptosis; Neural degenerative Ds; Mitochondrial Cytopathies; Inflammatory Ds; Metabolic Ds
Golgi Apparatus | Lens culinaris lectin; DiOC6 carbocyanine dye; COPs; Antibodies specific for Golgi |
Cytoskeletal Filament Networks | Microtubules; Actin; Intermediate Filaments; Kinesin, dynein, myosin; Microtubule associated proteins; Actin binding proteins; Rac/Rho; Keratins; GFAP; Von Willebrand's factor | Cancer; Neural degenerative Ds; Inflammatory Ds; Cardiovascular Ds; Skin Ds
Endoplasmic Reticulum | SNARE; PDI; Ribosomes | Neural degenerative Ds
Nuclear Membrane | Lamins; Nuclear Pore Complex | Cancer
Proteosome Apparatus | Ubiquityl transferases | Cancer
Chromatin | DNA; Histone proteins; Histone deacetylases; Telomerases | Cancer; Aging
Nucleolus | Phase markers |
Cytoplasm | Intermediary Metabolic Enzymes; BRCA1 | Cancer
Cytoplasmic Signaling Apparatus | Calcium; cAMP; PKC; pH | Cardiovascular Ds; Migraine; Apoptosis; Cancer
Microbe Specializations | Flagella; Cilia; Cell Wall components: Chitin synthase | Infectious Ds
Plant specializations | Chloroplast; Cell Wall components | Crop Protection

In one preferred embodiment, the cellular components considered in separate
images include one or more of DNA, cytoskeletal proteins, and Golgi. In a specific
embodiment, the images for each combination of cell line, dose, and compound
include a DNA image, a tubulin image, and a Golgi image. Various markers can be
used for each of these components. In a preferred embodiment, the DNA marker is
DAPI, the tubulin marker is an antibody specific for tubulin, and the Golgi marker is
LC lectin.

In one specific approach, the above three markers are analyzed using two
separate process runs. In a first run, a cell line is simply stained with a marker for
DNA. In a second run, the cell line is stained with all three markers. The first process
run is used simply to identify cell cycle information. For example, this run is used to
determine the proportion of cells in each separate phase of the cell cycle (G1, S, G2,
M, and/or various subphases of M). The two process runs are employed because
imaging tubulin and Golgi requires repeated washing of the cells. This washing
selectively causes some cells to wash away, specifically rounded up and mitotic cells.
Therefore, the remaining cells imaged for tubulin and Golgi are biased toward
interphase states.

Regarding the doses or "levels" of the various stimuli, one should endeavor to
choose a range of doses that define an active zone for affecting phenotype in a cell
line of interest. In one approach, researchers perform a preliminary experiment with
each drug. The preliminary experiment may involve titration across a wide range of
concentrations. The titration may measure cell count or another appropriate biological
parameter. An upper boundary of the active zone may be a concentration at which
further increases of concentration have no additional effect on the cells. For example,
the upper boundary may be the minimum concentration at which all cells are killed. A
lower bound of the active zone is the lowest concentration at which some biological
effect can be observed.

In some cases, the highest dose allowed by the process is governed by some
physical parameter such as the maximum solubility of a compound. Alternatively, it
may be governed by the maximum volume of a compound solution that can be
administered to a well without having the solvent significantly affect the cells.

In a preferred embodiment, a highest level of the stimulus is first identified by
some technique. Then, additional lower levels of the stimulus are identified by
incremental reductions. For example, in the case of a chemical compound, serial
dilutions may be performed to generate lower level doses. At a minimum, at least two
levels of the stimulus must be considered. Preferably significantly more levels are
considered. In a preferred embodiment, at least five separate stimulus levels are
considered. In a specific preferred embodiment, eight separate levels are considered.
If a chemical compound serves as the stimulus, then the highest concentration of the
compound should be at least about two times that of the lowest concentration.
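
The serial dilution described above can be sketched in a few lines of Python. This is only an illustration: the function name, the two-fold dilution factor, and the starting concentration of 10.0 (in arbitrary units) are assumptions and are not specified in this description. Only the minimum of two levels and the example of eight levels come from the text.

```python
def serial_dilution(highest, factor=2.0, levels=8):
    """Return dose levels, highest first, produced by repeated dilution.

    `factor` (dilution per step) and the defaults are illustrative
    assumptions; the text only requires at least two levels.
    """
    if levels < 2:
        raise ValueError("at least two stimulus levels are required")
    if factor <= 1.0:
        raise ValueError("dilution factor must be greater than 1")
    return [highest / factor ** step for step in range(levels)]


# Eight levels starting from a hypothetical top concentration of 10.0.
doses = serial_dilution(10.0, factor=2.0, levels=8)
```

With any factor greater than one and at least two levels, the ratio of the highest to the lowest concentration grows with the number of levels, so choosing the factor and level count together determines how much of the active zone is covered.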

As indicated, phenotypic vectors for given stimuli and level combinations may
include multivariate information taken from different cell lines. However, this need
not be the case, as all the multivariate data of interest may be obtained from a single
cell line. Generally, a researcher will choose one or a range of cell lines that are
relevant to the area of interest. For example, if the researcher focuses on oncology
applications, the cell lines chosen may include different types of cancer and possibly
other cell lines that allow one to identify typical side effects of anti-cancer drugs. In
one specific embodiment pertaining to oncology, six different cell lines are
considered. These include HUVEC (human umbilical vein endothelial cells), A498,
A548, SF268, SKOV3, and DU145.



Imaging

As indicated, the phenotypic data characterizing each point on a response
curve is derived, at least in part, from images of cell lines exposed to particular
combinations of stimulus type and stimulus level. See block 109 in Figure 1, for
example. Various techniques for preparing and imaging appropriately treated cells are
described in U.S. Patent Applications 09/310,879, 09/311,996, and 09/311,890,
previously incorporated by reference. In the case of cells treated with a fluorescent
marker, a collection of such cells is illuminated with light at an excitation frequency.
A detector is tuned to collect light at an emission frequency. The collected light is
used to generate an image, which highlights regions of high marker concentration.

Additional operations may be performed prior to, during, or after the imaging 
operation (109) of Figure 1. For example, "quality control algorithms" may be
employed to discard image data based on, for example, poor exposure, focus failures, 
foreign objects, and other imaging failures. Generally, problem images can be 
identified by abnormal intensities and/or spatial statistics. 

In a specific embodiment, a correction algorithm may be applied prior to
segmentation to correct for changing light conditions, positions of wells, etc. In one
example, a noise reduction technique such as median filtering is employed. Then a
correction for spatial differences in intensity may be employed. In one example, the
spatial correction comprises a separate model for each image (or group of images).
These models may be generated by separately summing or averaging all pixel values
in the x-direction for each value of y and then separately summing or averaging all
pixel values in the y-direction for each value of x. In this manner, a parabolic set of
correction values is generated for the image or images under consideration. Applying
the correction values to the image adjusts for optical system non-linearities,
mis-positioning of wells during imaging, etc.
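
As a rough illustration of the row/column averaging described above, the following Python sketch builds a separable correction from the per-row and per-column mean intensities and divides it out. It is a simplification under stated assumptions: it uses the raw mean profiles directly rather than fitting a parabola to them, it treats the intensity falloff as multiplicative, and the function name and list-of-lists image layout are hypothetical.

```python
def flat_field_correct(image):
    """Divide out a separable row/column intensity profile.

    `image` is a list of rows of pixel intensities (image[y][x]).
    Assumption: spatial bias is multiplicative and separable in x and y.
    """
    rows, cols = len(image), len(image[0])
    # Average along x for each y, then along y for each x.
    row_mean = [sum(row) / cols for row in image]
    col_mean = [sum(image[y][x] for y in range(rows)) / rows
                for x in range(cols)]
    overall = sum(row_mean) / rows
    if overall == 0:
        return [list(row) for row in image]  # nothing to normalize against

    corrected = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            # Relative gain at (x, y); equals 1.0 for a uniform image.
            gain = row_mean[y] * col_mean[x] / (overall * overall)
            out_row.append(image[y][x] / gain if gain else image[y][x])
        corrected.append(out_row)
    return corrected


# A synthetic image whose brightness rises along both axes.
shaded = [[r * c for c in (1.0, 2.0, 3.0)] for r in (1.0, 2.0)]
flattened = flat_field_correct(shaded)
```

A parabolic model, as mentioned in the text, could be obtained by fitting a second-order polynomial to `row_mean` and `col_mean` before computing the gain; that step is omitted here for brevity.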

The production of the images includes cell plating, compound dilution,
compound addition and imaging focusing. Failures in any of these systems can be
detected by a variety of methods. For example, cell plating could fail because of a
clogged tip in a delivery pipette. Such failure can be identified by adding a
fluorescent dye or bead to the cell suspension. The fluorescence of this dye or bead is
chosen to be at a different channel (wavelength) than the markers used to image
cellular components. Another potential failure could occur during compound
delivery. To detect such failures, one can add a fluorescent dye or bead in the
compound plate before compound dilution. The amount of fluorescent dye or bead is
proportional to the amount of compound. Yet another potential problem occurs when
the focus of the image acquisition system changes during imaging. To account for
such spatial biases, one can employ control wells containing, for example, cells with
no or neutral compounds interspersed throughout the plate. Still another problem
results from foreign objects (e.g., small dust particles) in the well. This can be
addressed with image segmentation and statistical outlier identification techniques.

Generally the images used as the starting point for the methods of this
invention are obtained from cells that have been specially treated and/or imaged under
conditions that contrast the cell's marked components from other cellular components
and the background of the image. Typically, the cells are fixed and then treated with a
material that binds to the components of interest and shows up in an image (i.e., the
marker). Preferably, the chosen agent specifically binds to DNA, but not to most
other cellular biomolecules.

Multivariate Phenotypic Data from Images

At every combination of dose, cell line, and compound, one or more images
are obtained. As mentioned, these images are used to extract various parameter values
of relevance to a biological, phenotypic characterization of the compound of interest.
Generally a given image of a cell, as represented by one or more markers, can be
analyzed to obtain any number of image parameters. These parameters are typically
statistical or morphological in nature. The statistical parameters typically pertain to a
concentration or intensity distribution or histogram. The parameters chosen for use
with this invention should relate to the expected biological response of the cell lines to
the compound of interest. The parameters should also represent a diverse set of
phenotypic characteristics.

Some general parameter types suitable for use with this invention include a
cell count, an area, a perimeter, a length, a breadth, a fiber length, a fiber breadth, a
shape factor, an elliptical form factor, an inner radius, an outer radius, a mean radius,
an equivalent radius, an equivalent sphere volume, an equivalent prolate volume, an
equivalent oblate volume, an equivalent sphere surface area, an average intensity, a
total intensity, an optical density, a radial dispersion, and a texture difference. These
parameters can be average or standard deviation values, or frequency statistics from
the descriptors collected across a population of cells. In some embodiments, the
parameters include features from different cell portions or cell types.

Examples of some specific parameters/descriptors that may be suitable for use
in multivariate response paths of this invention are included in the following table.
Other descriptors can also be used without departing from the scope of the invention.



Name of Parameter | Explanation/Comments
----------------------------------------------------------------------
Count | Number of objects
Area |
Perimeter |
Length | X axis
Width | Y axis
Shape Factor | Measure of roundness of an object
Height | Z axis
Radius |
Distribution of Brightness |
Radius of Dispersion | Measure of how dispersed the marker is from its centroid
Centroid location | x-y position of center of mass
Number of holes in closed objects | Derivatives of this measurement might include, for example, Euler number (= number of objects - number of holes)
Elliptical Fourier Analysis (EFA) | Multiple frequencies that describe the shape of a closed object
Wavelet Analysis | As in EFA, but using wavelet transform
Interobject Orientation | Polar Coordinate analysis of relative location
Distribution Interobject Distances | Including statistical characteristics
Spectral Output | Measures the wavelength spectrum of the reporter dye. Includes FRET
Optical density | Absorbance of light
[name missing in source] | Phase shifting of light
Reflection interference | Measure of the distance of the cell membrane from the surface of the substrate
1, 2 and 3 dimensional Fourier Analysis | Spatial frequency analysis of non closed objects
1, 2 and 3 dimensional Wavelet Analysis | Spatial frequency analysis of non closed objects
Eccentricity | The eccentricity of the ellipse that has the same second moments as the region. A measure of object elongation.
Long axis/Short Axis Length | Another measure of object elongation.
Convex perimeter | Perimeter of the smallest convex polygon surrounding an object
Convex area | Area of the smallest convex polygon surrounding an object
Solidity | Ratio of polygon bounding box area to object area.
Extent | Proportion of pixels in the bounding box that are also in the region
Granularity |
Pattern matching | Significance of similarity to reference pattern
Volume measurements | As above, but adding a z axis
Number of Nodes | The number of nodes protruding from a closed object such as a cell; characterizes cell shape
End Points | Relative positions of nodes from above



The features used in the actual points comprising a response path of this
invention may be parameters directly extracted from the images or they may be
biological characterizations derived from the parameters. Note that the points may
also include some features that were not directly or indirectly obtained from the
images. For example, the points may include information obtained from public
sources such as databases, literature, etc. Further, the features comprising the points
may include non-image related data, such as data obtained from chemical and
biological assays.

Often, the parameters are chosen based upon a biological understanding. For
example, if a cell's state in the cell cycle is important to the biological problem being
investigated, then parameters that characterize the amount of DNA in a cell and/or the
degree of condensation of that DNA into chromosomes are relevant. In a specific
example, cell cycle parameters include the total quantity of DNA in a nucleus, the area
of the nucleus, and the intensity variance of the cellular DNA. A full discussion of the
relevant parameters for characterizing the cell cycle is presented in U.S. Patent
Application No. 09/729,754, previously incorporated by reference.

Similarly, if an objective is to characterize the Golgi in a cell, this can be
accomplished with parameters that define the location of the Golgi with respect to the
nucleus, describe the texture of the Golgi and describe the local concentration of
Golgi components. The full discussion of the parameters relevant in characterizing
Golgi is presented in U.S. Patent Application No. 09/792,012 (Atty. Docket No.
CYTOP012), previously incorporated by reference. Of specific interest, the Golgi
complex in a perinuclear region may be characterized using parameters such as the
mean, standard deviation, and kurtosis of pixel intensity, and various eigenvalues
obtained by singular value decomposition of a pixel intensity matrix for the Golgi
marker. From these parameters, the Golgi complex of a given cell may be
characterized as normal, diffused, dispersed, or dispersed and diffused.

Further, if the cell shape provides further relevant phenotypic data, then
parameters can be chosen accordingly. In one embodiment, a tubulin or other
cytoskeletal component is marked and imaged to provide parameters relevant to cell
shape. Specific examples of such parameters include the number of nodes on a cell
image, the distance between end points of those nodes, a coefficient of tubulin
polymerization (e.g., average pixel intensity of object pixels in a tubulin channel),
averaged across all cells in a population, and a coefficient of microtubule
reorganization (e.g., standard deviation of wavelet coefficients), averaged across all
cells in a population. A full discussion of parameters relevant to characterizing cell
shape can be found in U.S. Patent Application No. 09/792,013 (Atty. Docket No.
CYTOP013), previously incorporated by reference.

While a fundamental biological understanding can often direct one to the
appropriate choice of parameters for use in this invention, a systematic analysis of
data can help identify parameters that might not be immediately apparent. Such
analysis can be conducted in a manner that finds parameters that are best able to show
subtle differences in the response path. By considering the effect of varying a single
parameter at a time, one can quickly home in on the most relevant parameters for
developing response curves in accordance with this invention.

As applied to specific markers, one preferred collection of parameters includes
(1) total number of cells in a population of interest, (2) number of cells relative to
number of cells in one or more controls, (3) proportion of cells in each of the stages of
cell cycle (G1, S, G2, pre-anaphase mitotic, post-anaphase mitotic), (4) area of cell
nuclei, averaged for each of the stages of the cell cycle, (5) diameter of cell nuclei,
averaged for each of the stages of cell cycle, (6) axes ratio (measure of elongation) of
nuclei, averaged for each of the stages of cell cycle, (7) eccentricity of nuclei (another
measure of elongation), averaged for each of the stages of cell cycle, (8) solidity of
nuclei (measure of a shape), averaged for each of the stages of cell cycle, (9) total
intensity of nuclei pixels (a measure of amount of DNA in a cell), averaged for each
of the stages of cell cycle, (10) variance of pixel intensities in a nucleus, averaged for
each of the stages of cell cycle, (11) proportion of cells with normal Golgi in a
subpopulation of cells in each stage of the cell cycle, (12) proportion of cells with
diffuse Golgi in a subpopulation of cells in each stage of the cell cycle, (13)
proportion of cells with dispersed Golgi in a subpopulation of cells in each stage of
the cell cycle, (14) proportion of cells with dispersed and diffused Golgi in a
subpopulation of cells in each stage of the cell cycle, (15) coefficient of the Golgi
dispersion (kurtosis of the Golgi marker in a region of the cell), (16) coefficient of
tubulin polymerization (average pixel intensity of object pixels in a tubulin channel),
averaged across all cells in a population, and (17) coefficient of microtubule
reorganization (standard deviation of wavelet coefficients), averaged across all cells in
a population.

In a more specific embodiment, the parameter set for each dose/compound
point in a response path includes parameters derived from DNA markers, Golgi
markers, and tubulin markers. It has been found that the following ten parameters
provide particularly useful phenotypic results.

1. The size of the nuclei as derived from a DNA marker. This value is
provided as the average area of the nuclei of all interphase cells in an image.

2. The average ellipsicity of nuclei of interphase cells.

3. The difference between the proportion of interphase cells in G1 and
the proportion of interphase cells in G2.

4. The proportion of interphase cells in S.

5. The "mitotic index" which specifies the proportion of mitotic cells
in the image.

6. The proportion of interphase cells having normal Golgi.

7. The proportion of interphase cells having diffuse Golgi.

8. The proportion of interphase cells having diffuse and disperse
Golgi.

9. The mean pixel intensity obtained from marked tubulin. This value
is obtained by determining an overall threshold for the image and considering only
objects having intensities above that threshold. The mean value of intensity of all
pixels over the threshold is used.

10. A second order wavelet obtained from all pixels over the threshold
value in the tubulin image.

In a particularly preferred specific embodiment, the above ten parameters are
obtained for each of multiple cell lines, such as the six cell lines defined above, and
packaged in a vector. This vector then represents a single dose/compound point to be
used in a response path for the compound of interest. Note that this vector spans
multiple cell lines and multiple phenotypic features. To obtain these various features,
three separate markers were considered. A DNA marker was used for the first five
parameters. A Golgi marker was used for the next three parameters. Finally, a tubulin
marker was used for the last two parameters.
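
To make the packaging step concrete, the following Python sketch concatenates ten parameters for each of six cell lines into one 60-element vector. The feature key names, the nested-dictionary input layout, and the fixed ordering are illustrative assumptions; the description states only that the parameters are packaged in a vector.

```python
# Cell lines as listed in the oncology embodiment described earlier.
CELL_LINES = ["HUVEC", "A498", "A548", "SF268", "SKOV3", "DU145"]

# Hypothetical short names for the ten parameters enumerated above.
FEATURES = [
    "nuclear_area",            # 1  (DNA marker)
    "nuclear_ellipsicity",     # 2  (DNA marker)
    "g1_minus_g2",             # 3  (DNA marker)
    "s_fraction",              # 4  (DNA marker)
    "mitotic_index",           # 5  (DNA marker)
    "golgi_normal",            # 6  (Golgi marker)
    "golgi_diffuse",           # 7  (Golgi marker)
    "golgi_diffuse_disperse",  # 8  (Golgi marker)
    "tubulin_mean_intensity",  # 9  (tubulin marker)
    "tubulin_wavelet",         # 10 (tubulin marker)
]


def build_phenotype_vector(measurements):
    """Flatten {cell_line: {feature: value}} into one ordered vector.

    A fixed cell-line and feature order keeps every dose/compound point
    comparable dimension by dimension.
    """
    return [float(measurements[line][feature])
            for line in CELL_LINES
            for feature in FEATURES]
```

One such vector would be built for each combination of compound and dose, and the sequence of vectors over increasing dose forms the response path for that compound.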

Visualization and Comparison of Response Paths 

Lists of stimuli and associated quantitative phenotypes may be stored as
database records or other data structures that can be queried or otherwise accessed as
part of an analysis procedure. The stimuli may also be associated with other relevant
data such as clinical toxicity, cellular toxicity, hypersensitivity, mechanism of action,
etc. (when available). The stored phenotypic data is used to generate and depict
response paths.

Various techniques may be employed to visualize the response paths generated
as described above. In order for a human observer to make meaningful comparisons,
the space in which the response paths are presented should be comprehensible. Note
that for complicated quantitative phenotypes representing individual points on the
path, there may be very many separate variables (60 in the example above). In
principle, each of these variables represents a separate dimension. So one may be
confronted with a 60 dimensional space, for example. It becomes difficult
to visualize meaningful trends or clusters in such a high dimensional space. Consider the
problem of trying to visualize a trend in phenotypes comprised of three cellular
components (e.g. tubulin, DNA and Golgi), each of which has multiple relevant
parameters (e.g. total quantity of DNA and variance in the concentration of DNA).
There are more than three relevant dimensions to be considered in
analyzing such phenotypes.

One possible solution to the problem involves selecting two or three 
dimensions (features) that are expected to be most relevant to a particular response
curve. Unfortunately, it becomes impossible to view other potentially meaningful 
phenotypic features on the same two or three-dimensional space. 

Various techniques may be employed to address this problem. Such
techniques create a lower dimensional space in which the individual dimensions
capture two or more features of the data. Examples of such techniques include
principal component analysis, linear and non-linear discriminant analysis,
multidimensional scaling, and projection pursuit techniques. A particularly preferred
approach involves the use of principal component analysis. Principal component
analysis determines the vectors (dimensions) through which a data set shows the
greatest variation in multidimensional space. The first principal component shows the
direction of greatest variation in the data. The second principal component shows the
direction of the second greatest variation in data and so on. One can select as many
principal components as are suitable to depict one's data. Typically, the first one, two,
or three principal components are selected for presenting data to human observers.
Principal component analysis is described more fully in Jackson, J. E. (1991) A User's
Guide to Principal Components. New York: John Wiley and Sons; and Jolliffe, I. T.
(1986) Principal Component Analysis. New York: Springer-Verlag, both of which are
incorporated herein by reference for all purposes.
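
The idea of the first principal component as the direction of greatest variation can be illustrated with a small, dependency-free Python sketch. It uses power iteration on the sample covariance matrix, which is one simple way to find the leading component and is not necessarily how any particular statistical package computes it. The function name and the starting vector are assumptions made for illustration.

```python
import math


def first_principal_component(data, iterations=200):
    """Return (direction, scores) for the leading principal component.

    `data` is a list of equal-length numeric rows (one row per point).
    Power iteration assumes the start vector is not orthogonal to the
    leading component and that the largest eigenvalue is distinct.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Deliberately non-uniform start vector (1, 2, 3, ...), normalized.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]

    # Coordinate of each point along the component.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Points lying exactly on the line y = 2x.
direction, scores = first_principal_component(
    [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

Further components could be obtained by removing the variance explained by each component found (deflation) and repeating; in practice a library routine based on eigendecomposition or singular value decomposition would normally be used.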

Various tools for performing principal component analysis are commercially
available. One suitable statistical computing package for performing
PCA is available from Insightful Corporation (formerly MathSoft) of Seattle, WA.
Principal component analysis can be applied to quantitative phenotypic data sets in a
straightforward manner. However, it will generally be necessary to standardize
phenotypic data sets before submitting them to principal component analysis. This is
because the various scalars that comprise the individual features of a quantitative
phenotype reside on vastly different scales. For example, the mitotic index will range
from zero to one hundred percent, while the size of the nuclei, average ellipsicity of
the nuclei, average pixel intensity of the tubulin marker, etc. each have very different
scales and associated units. To bring these various features onto a comparable scale
for meaningful PCA, one may perform transformations to standardize the
data. In one preferred embodiment, each of the dimensions is scaled by considering
all the data along that dimension (e.g., all values of nucleus area), subtracting the
mean of that data and dividing by the standard deviation. This effectively scales the
data for standardization.
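
The standardization just described (subtract the mean of each dimension and divide by its standard deviation) can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form is intended.

```python
from statistics import mean, pstdev


def standardize(vectors):
    """Z-score each dimension (column) across all phenotypic vectors.

    Dimensions with zero spread are mapped to 0.0 to avoid division
    by zero. Uses the population standard deviation (an assumption).
    """
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, spreads)]
        for row in vectors
    ]


# Two features on very different scales, e.g. a proportion and an area.
raw = [[0.10, 100.0], [0.20, 200.0], [0.30, 300.0]]
scaled = standardize(raw)
```

After this step every dimension has zero mean and unit spread, so no single feature dominates the principal components merely because of its units.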

After PCA is performed by a suitable tool, the results should be presented
graphically. Various graphical tools are suitable for this purpose. One is provided by
S+ Corporation. Another particularly useful graphical depiction tool is Spotfire.net
available from SpotFire, Inc. of Cambridge, MA. Any of these tools will not only
present the data in principal component space, but will also identify which variables
(features) contribute the most to each of the principal components. Because the
present invention is concerned primarily with paths, the graphical depiction will
preferably show connections between individual points along the dose response path
for each particular stimulus.
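
To make concrete what it means for a variable to "contribute the most" to a principal component, the sketch below approximates the first principal axis by power iteration on the covariance matrix and reports the feature with the largest loading. Power iteration is chosen here only because it needs no external library; it is not claimed to be what the commercial tools do. The function name, the data, and the iteration count are hypothetical, and the input is assumed to be mean-centered already.

```python
import math

def first_principal_component(rows, iterations=200):
    """Approximate the leading principal axis of mean-centered rows."""
    n, d = len(rows), len(rows[0])
    # Covariance matrix of the (already mean-centered) feature columns.
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    # Unequal starting weights make an exactly orthogonal start unlikely.
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical mean-centered data: features 0 and 1 vary together,
# feature 2 varies less and independently.
rows = [(-1.0, -1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]
loadings = first_principal_component(rows)
# Index of the feature contributing most to the first component.
top_feature = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
```

The entries of the returned vector are the loadings: the larger the absolute value, the more that feature drives the component.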

Meaningful comparison between related stimuli requires that one identify
patterns or trends in various response curves. This may be accomplished with or
without the aid of a visualization technique/tool of the type described above. If a
human observer is to participate in the pattern recognition, then such a visualization
tool will typically provide great assistance. However, if a machine is to do the
comparison, the reduced dimensional visualization technique may be unnecessary.
Examples of techniques that may be employed for such comparison include
techniques that determine an average difference or distance between two potentially
related stimulus response curves, clustering, and the like.
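
One simple reading of "an average difference or distance" between two response curves is sketched below. It assumes both paths were sampled at the same stimulus levels and uses the Euclidean distance between matching points; the function name and the sample coordinates are hypothetical, and other distance measures would serve equally well.

```python
import math

def mean_path_distance(path_a, path_b):
    """Average Euclidean distance between matching dose points on two paths."""
    if len(path_a) != len(path_b):
        raise ValueError("paths must be sampled at the same number of levels")
    return sum(math.dist(p, q) for p, q in zip(path_a, path_b)) / len(path_a)

# Hypothetical paths in a two-dimensional principal component space,
# ordered from lowest to highest stimulus level.
path_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_b = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
distance = mean_path_distance(path_a, path_b)
```

A small value suggests the two stimuli trace similar paths; the per-level distances (0, 1, and 2 in this example) also show where the paths begin to separate.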

Various examples applying principal component analysis to complex
multivariate quantitative phenotypes are presented below. In these examples, dose
response curves through multivariate phenotypes clearly show trends and clustering
based upon mechanism of action.

In comparing response paths of potentially related stimuli, one typically
identifies similarities in the general pathways. These similarities often show up in the
path trajectories, starting and ending points, etc. In comparing stimuli such as
exposure to chemical compounds, note that most drugs have response paths that
follow reasonably similar trajectories at relatively low concentrations, but then diverge
as the concentration increases. In other words, the phenotypic manifestations of
distinct mechanisms of action appear most pronounced only at high concentrations.

Software/Hardware

Generally, embodiments of the present invention employ various processes
involving data stored in or transferred through one or more computer systems.
Embodiments of the present invention also relate to an apparatus for performing these
operations. This apparatus may be specially constructed for the required purposes, or
it may be a general-purpose computer selectively activated or reconfigured by a
computer program and/or data structure stored in the computer. The processes
presented herein are not inherently related to any particular computer or other
apparatus. In particular, various general-purpose machines may be used with
programs written in accordance with the teachings herein, or it may be more
convenient to construct a more specialized apparatus to perform the required method
steps. A particular structure for a variety of these machines will appear from the
description given below.

In addition, embodiments of the present invention relate to computer readable
media or computer program products that include program instructions and/or data
(including data structures) for performing various computer-implemented operations.
Examples of computer-readable media include, but are not limited to, magnetic media
such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM
disks; magneto-optical media; semiconductor memory devices; and hardware devices
that are specially configured to store and perform program instructions, such as
read-only memory devices (ROM) and random access memory (RAM). The data and
program instructions of this invention may also be embodied on a carrier wave or
other transport medium. Examples of program instructions include both machine
code, such as produced by a compiler, and files containing higher level code that may
be executed by the computer using an interpreter.

Figure 2 illustrates a typical computer system that, when appropriately
configured or designed, can serve as an image analysis apparatus of this invention.
The computer system 200 includes any number of processors 202 (also referred to as
central processing units, or CPUs) that are coupled to storage devices including
primary storage 206 (typically a random access memory, or RAM) and primary storage
204 (typically a read only memory, or ROM). CPU 202 may be of various types
including microcontrollers and microprocessors such as programmable devices (e.g.,
CPLDs and FPGAs) and unprogrammable devices such as gate array ASICs or general
purpose microprocessors. As is well known in the art, primary storage 204 acts to
transfer data and instructions uni-directionally to the CPU and primary storage 206 is
used typically to transfer data and instructions in a bi-directional manner. Both of
these primary storage devices may include any suitable computer-readable media such
as those described above. A mass storage device 208 is also coupled bi-directionally
to CPU 202 and provides additional data storage capacity and may include any of the
computer-readable media described above. Mass storage device 208 may be used to
store programs, data and the like and is typically a secondary storage medium such as
a hard disk. It will be appreciated that the information retained within the mass
storage device 208 may, in appropriate cases, be incorporated in standard fashion as
part of primary storage 206 as virtual memory. A specific mass storage device such as
a CD-ROM 214 may also pass data uni-directionally to the CPU.

CPU 202 is also coupled to an interface 210 that connects to one or more
input/output devices such as video monitors, track balls, mice, keyboards,
microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape
readers, tablets, styluses, voice or handwriting recognizers, or other well-known input
devices such as, of course, other computers. Finally, CPU 202 optionally may be
coupled to an external device such as a database or a computer or telecommunications
network using an external connection as shown generally at 212. With such a
connection, it is contemplated that the CPU might receive information from the
network, or might output information to the network in the course of performing the
method steps described herein.

In one embodiment, the computer system 200 is directly coupled to an image
acquisition system such as an optical imaging system that captures images of cells.
Digital images from the image generating system are provided via interface 212 for
image analysis by system 200. Alternatively, the images processed by system 200 are
provided from an image storage source such as a database or other repository of cell
images. Again, the images are provided via interface 212. Once in the image analysis
apparatus 200, a memory device such as primary storage 206 or mass storage 208
buffers or stores, at least temporarily, digital images of the cells. In addition, the
memory device may store the quantitative phenotypes that represent the points on the
response path. The memory may also store various routines and/or programs for
analyzing and presenting the data, including the response paths. Such
programs/routines may include programs for performing principal component
analysis, regression analyses, path comparisons, and for graphically presenting the
response paths.

Examples

As indicated above, an underlying premise of this invention is that changes in
cell physiology are detectable through single-cell image analysis of cellular markers
and, by measuring such changes in a sophisticated fashion, one can monitor biological
similarities and differences between compounds. To characterize and demonstrate the
ability of multivariate quantitative phenotype response paths to distinguish different
classes of molecules, a panel of commercially available compounds with broad
mechanisms of action was tested using the present invention. The types of
compounds tested are listed in the following table.



Class                            Primary Protein Target               Number of Compounds

Calcium                          Endoplasmic reticulum Ca2+-ATPase    3
Calcium                          Calmodulin                           6
Cytoskeleton                     Actin                                8
Cytoskeleton                     Tubulin                              9
G Protein Effectors              G-proteins Gi and Go                 4
Gene Regulation                  Topoisomerase II                     6
Ion Pump                         V-ATPase                             2
Oxidative Phosphorylation        Mitochondrial ATPases                3
Posttranslational Modification   Farnesyltransferase                  2
Posttranslational Modification   Geranylgeranyltransferase 1          3
Protein Kinase                   p38 MAP kinase                       3
Protein Kinase                   PKC                                  3
Protein Kinase                   p34cdc2/cyclin B                     4

A three-dimensional representation of three principal components calculated
for each compound is shown in Figure 3A. The data points are presented within the
first three principal components obtained for the entire data set. Note that the controls
cluster tightly in the center of the graph. The specific features and cell lines used to
construct the quantitative phenotypes shown as points in the plot are set forth above in
the "Multivariate Phenotypic Data from Images" and "Selecting Experiments for
Providing Response Paths" sections. Specifically, the features are the ten features
listed near the end of the "Multivariate Phenotypic Data from Images" section
collected for each of the six cell lines listed in the "Selecting Experiments for
Providing Response Paths" section. In Figure 3A, the connected data points link
identical compounds at increasing concentration. Specifically, there were eight
different concentrations for each compound.
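
Linking the points for one compound in order of increasing concentration, as described for Figure 3A, can be sketched as below. The record layout, the compound names, and the coordinates are hypothetical and are not taken from the experiment.

```python
from collections import defaultdict

def build_paths(records):
    """Group (compound, concentration, point) records into dose-ordered paths."""
    grouped = defaultdict(list)
    for compound, concentration, point in records:
        grouped[compound].append((concentration, point))
    return {
        compound: [point for _, point in sorted(entries, key=lambda e: e[0])]
        for compound, entries in grouped.items()
    }

# Hypothetical records: (compound, concentration, point in principal component space).
records = [
    ("compound_a", 10.0, (0.9, 0.1, 0.0)),
    ("compound_a", 1.0, (0.2, 0.0, 0.0)),
    ("compound_b", 1.0, (0.1, 0.1, 0.0)),
    ("compound_b", 10.0, (0.1, 0.8, 0.3)),
]
paths = build_paths(records)
```

Each value in the resulting dictionary is the ordered list of points that a plotting tool would connect to draw one response path.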

This dose-response path has proven to accurately classify different compound
mechanisms. Even though many of the compound classes tested do not have a direct
effect on the components/markers used in this experiment, quantitative phenotype
paths are able to detect differences in this wide array of compounds. For example,
there is a distinct separation of the farnesyltransferase and geranylgeranyltransferase
inhibitors from each other even though both are involved in post-translational protein
modification. These compounds are quite different from compounds that inhibit
mitochondrial function (see Figure 3B).

Similarly, actin and tubulin inhibitors show distinct dose-response trajectories
overall even though both classes affect the cytoskeleton (see Figure 3C).
Furthermore, compounds that affect signaling pathways directly, such as protein
kinase inhibitors and calcium sensitizers, are uniquely differentiated from the other
classes as well (see Figure 3D).

The demonstration that many compound classes follow unique dose-response 
trajectories validates this invention's ability to measure the inhibition of many
different types of targets by classifying a quantitative phenotype directly using a few
carefully selected biological analyses. This information may be used to prioritize and
expand marker sets, cell lines, and time-points used to generate the quantitative 
phenotypes. Further, it greatly increases the ability to resolve and differentiate ever 
more similar compound mechanisms. 

Another example demonstrates the ability to investigate biological feature data 
(of the type used in the first example) to understand the differences and similarities 
between compounds. Despite the fact that the marker sets in this example did not
include components of the actin cytoskeleton, the actin inhibitors are all uniquely
classified by the quantitative phenotype paths. Figure 4 highlights the deviation of the
actin inhibitor Cytochalasin A from the other actin inhibitors. Figure 4 also shows
that the Cytochalasin A deviation was reliably reproduced when the experiment was run
a second time.

To identify which biological features were changing in Cytochalasin A as the
concentration of compound increased, biological feature image plots (not shown) for
all cell lines and concentrations of two actin inhibitors were compared. The most
significant difference in these image plots is the lack of an increase in the tubulin
features at increasing concentrations of Cytochalasin A. All of the other actin
inhibitors showed an increase in the tubulin biological features at the same
concentrations (data not shown).

The change in the tubulin feature was further validated by inspection of the
dose response graphs of all actin inhibitors for that feature across all cell lines and
concentrations (see Figure 5A (Cytochalasin A) and Figure 5B (Cytochalasin J)) and
inspection of representative dose response image montages (not shown). (N.B. 18
different montages are available for 6 cell lines by 3 replicates.) Review of the
literature confirmed that Cytochalasin A interferes with microtubule assembly by
reacting with sulfhydryl groups. The data obtained with the present invention
suggest that this side effect reflects a lower affinity than Cytochalasin A has for actin
itself, as evident from the common PCA trajectory it shares with the rest of the actin
inhibitors at lower concentrations.

Another way to review the data is to ask how similar the dose response paths
are for different cell lines. These cell lines, with their different expression patterns,
may exhibit increased or decreased sensitivity or off-target effects to different
members within a single class of compounds. Because PCA sheds light only
on the features that most distinguish compounds, an in-depth cell-line sensitivity
analysis may reveal more subtle differences and insight into molecular specificity.

A way to visualize the similarities of compounds at biologically relevant
concentration is to use hierarchical clustering. A dendrogram in Figure 6 shows the
compounds profiled at the IC50 for A549 cells. This visualization presents a different
insight into compound mechanism than the PCA plot. For one, the p34cdc2/cyclin B
inhibitors, the tubulin depolymerizers, and the G-protein activators are all highly
correlated, while differences between the actin inhibitor molecules are highlighted.
The biological features that are different can be visualized using an image plot of the
same compounds in cluster order. This type of comparison can shed even more light
on the biological similarities and differences between compounds at an even more
subtle level.
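
The kind of hierarchical clustering that underlies a dendrogram can be sketched as follows. The text does not say which linkage rule or distance measure was used for Figure 6, so average linkage over Euclidean distance is an assumption here, as are the function name, the compound names, and the profile values.

```python
import math
from itertools import combinations

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(math.dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the closest pair of clusters and record the merge height.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical quantitative phenotypes at a single concentration per compound.
profiles = {
    "compound_a": (1.0, 0.0),
    "compound_b": (1.1, 0.1),
    "compound_c": (-1.0, 2.0),
    "compound_d": (-1.2, 2.1),
}
merges = average_linkage(profiles)
```

The recorded merges, read from first to last, give the branching order and heights from which a dendrogram can be drawn.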

The effects of cell line sensitivity are apparent when one inspects the dose
response graphs for a biological feature for all cell lines (see Figure 7A). Mastoparan,
an amphiphilic wasp poison known to activate G proteins, and its analog
tetradecapeptide derivative, MAS7, have very similar quantitative phenotype dose
response profiles, as seen in the PCA plots (see Figure 3D) and dendrogram (see
Figure 6), yet quite distinct cell line sensitivities (Figure 7B). MAS7 is reported to
have 5-fold greater potency than Mastoparan, but the dose response curves show that
MAS7 is a more potent compound in the cancer cell lines; it does not exhibit
increased sensitivity in the normal cell line. Further, the increase in potency is cell
line dependent. The graphs of Figures 7A and 7B depict the decrease in the number of
cells per image for each cell line as a function of concentration.

The oncology program at Cytokinetics, Inc. (South San Francisco, CA) used the
quantitative phenotyping technologies of this invention in a retrospective study to
measure cellular phenotypes and biological effects of compounds that inhibit
oncology program targets. In this effort, Cytokinetics' scientists submitted a series of
57 primary hits and optimized compounds against oncology targets for profiling with
this invention. The quantitative phenotyping of this invention identified a number of
attractive chemical series against a validated target with a defined cellular
morphological change expected from inhibiting the target. Quantitative phenotyping
also identified compounds that exhibited off-target biological effects.

Quantitative profiles of the oncology compounds were compared to a number
of control compounds. Inspection of the quantitative profiles in principal component
space revealed a significant and reproducible separation among the oncology targets,
non-specific compounds, and controls. Notably, quantitative phenotyping
differentiated compounds with similar biological effects, but different targets. Figure
8 shows the dose response path (obtained as described in the examples above) taken
by 83 different compounds clustered in groups representative of the mechanism of
action. These paths connect identical compounds at increasing concentrations. Note
that compounds that inhibit the Cytokinetics target are distinct from those that affect
other cancer targets such as topoisomerase II, tubulin stabilizers, or tubulin
depolymerizers.

Most of the primary hits and early analogs from multiple chemical classes for
one of the targets line up along a similar dose-response trajectory, while a few veer
away (see Figures 9 and 10). This level of multivariate data groups structurally
unrelated compounds into groups with similar cellular phenotypes. Having this data
available early in the drug discovery process allows the research scientist to know if
structurally unrelated compounds cause a similar cellular phenotype that correlates to
similar target selectivity, and to rapidly identify and reject compounds that correlate
to off-target effects. The more compounds screened with the technologies of this
invention, the clearer the distinctions become.

The optimized oncology compounds superimposed on the dose-response
trajectory of the primary hits and formed a compact cluster of data points in one
region of the graph that excluded other compounds (see Figure 11). Inspection of the
data and images revealed that the compounds in this region caused the morphological
phenotype expected from inhibiting the target.

This region was magnified to display each compound and concentration that
caused the optimal profile (see Figure 12). There were 39 compounds at many
different concentrations showing that many of the chemical analogs have a similar
profile. All compounds in the cluster are at concentrations lower than 10 micromolar.
As one of the criteria for lead compound selection was potency, the number of
compounds was further reduced to 8 by asking which compounds were able to
produce the target specific profile at a delivered concentration of less than 40
nanomolar.
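
The potency filter described above amounts to keeping those compounds that reach the target-specific profile at some concentration below a cutoff. A minimal sketch follows; the function name, the record layout, and the sample compounds and concentrations are hypothetical and do not reproduce the study's data.

```python
def potent_compounds(hits, max_concentration_nm=40.0):
    """Compounds whose lowest profile-producing concentration is below the cutoff."""
    lowest = {}
    for compound, concentration_nm in hits:
        # Track the lowest concentration at which each compound showed the profile.
        lowest[compound] = min(concentration_nm, lowest.get(compound, float("inf")))
    return sorted(c for c, conc in lowest.items() if conc < max_concentration_nm)

# Hypothetical (compound, concentration in nanomolar) pairs that fell inside
# the target-specific region of principal component space.
hits = [
    ("compound_1", 100.0),
    ("compound_1", 30.0),
    ("compound_2", 500.0),
    ("compound_3", 39.9),
]
selected = potent_compounds(hits)
```

Here compound_1 and compound_3 pass the 40 nanomolar cutoff, while compound_2 produces the profile only at a higher concentration and is dropped.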

With the reduced data set, the biological feature data was further inspected to
gain insight into the similarities and differences between these compounds. The
compounds selected had a very distinct set of biological features that would have been
almost impossible to predict in any other way. The differences between these
compounds were further evaluated on a feature-by-feature level to provide further
insight into the compounds' mechanism.

Figure 13 shows how a single multivariate dose response experiment was used 
to identify and narrow down 57 hits to eight highly potent and specific compounds. 
This demonstrates the breadth of information generated by the present invention.

As a postscript, of the eight compounds shown in this example, four had been
selected for extensive toxicity testing, and a derivative of one of those four was
selected as a development compound. Going forward, the oncology program can use
the profile information to select backup candidates for this target either by exploring
the other four compounds identified, or profiling new compounds, and asking how
similar they are to the optimal profile identified in this experiment.



Conclusion 

Although the above has generally described the present invention according to
specific processes and apparatus, the present invention has a much broader range of
applicability. In particular, the present invention has been described in terms of
cellular phenotypes that are derived primarily from image analysis, but is not so
limited. Phenotypic stimulus response curves of this invention may contain data
obtained primarily from non-image sources. Of course, one of ordinary skill in the art
would recognize other variations, modifications, and alternatives.

CLAIMS

What is claimed is:

1. A method for determining the response of cells to multiple levels of a
stimulus, the method comprising: 

for each of the multiple levels of stimulus, obtaining a plurality of feature 
values, at least some of which characterize the phenotype of cells exposed to the 
particular level of the stimulus, to thereby produce a separate quantitative phenotype 
of the cells at each level of stimulus; and

identifying a path through the separate quantitative phenotypes of cells 
exposed to the stimulus. 

2. The method of claim 1, wherein the stimulus is selected from the group
consisting of exposure to a chemical compound, exposure to a biological agent,
exposure to electromagnetic radiation, exposure to particle radiation, exposure to an 
electrical or magnetic field or force, exposure to a mechanical field or force, and 
combinations thereof. 

3. The method of claim 1, wherein the stimulus is exposure to a chemical
agent. 

4. The method of claim 1, further comprising comparing the path to a different
path produced for a different stimulus to which cells were exposed at multiple levels. 

5. The method of claim 1, wherein at least some of the feature values are
obtained from an image of the cells. 

6. The method of claim 1, wherein at least one of the feature values
characterizes cell morphology.

7. The method of claim 1, wherein at least one of the feature values
characterizes a statistical property of the cells. 

8. The method of claim 1, wherein at least one of the feature values is a
biological classification of the cells.

9. The method of claim 8, wherein the biological classification specifies a cell
cycle state of the cells. 

10. The method of claim 1, wherein the multiple levels of stimulus are multiple
concentrations of a therapeutic or potential therapeutic.

11. The method of claim 1, further comprising presenting a graphical
representation of the path.

12. The method of claim 11, wherein the graphical representation is provided
along one or more principal components obtained via a principal component analysis.

13. The method of claim 1, further comprising providing quantitative
phenotypes for each of multiple stimuli; and

using all the quantitative phenotypes to provide a reduced-dimensionality
space in which to depict the path.

14. The method of claim 1, wherein the multiple levels of stimulus are multiple
times after an initial exposure to the stimulus. 

15. A computer program product comprising a machine readable medium on 
which is provided program instructions for determining the response of cells to 
multiple levels of a stimulus, the program instructions comprising: 

program code for obtaining, for each of the multiple levels of stimulus, a
plurality of feature values, at least some of which characterize the phenotype of cells
exposed to the particular level of the stimulus, to thereby produce a separate
quantitative phenotype of the cells at each level of stimulus; and

program code for identifying a path through the separate quantitative
phenotypes of cells exposed to the stimulus.

16. The computer program product of claim 15, wherein the stimulus is selected
from the group consisting of exposure to a chemical compound, exposure to a
biological agent, exposure to electromagnetic radiation, exposure to particle radiation,
exposure to an electrical or magnetic field or force, exposure to a mechanical field or
force, and combinations thereof.

17. The computer program product of claim 15, further comprising program
instructions for comparing the path to a different path produced for a different 
stimulus to which cells were exposed at multiple levels. 

18. The computer program product of claim 15, wherein at least some of the
feature values are obtained from an image of the cells. 

19. The computer program product of claim 15, wherein at least one of the
feature values characterizes cell morphology. 

20. The computer program product of claim 15, wherein at least one of the
feature values characterizes a statistical property of the cells. 

21. The computer program product of claim 15, wherein at least one of the
feature values is a biological classification of the cells. 

22. The computer program product of claim 15, further comprising program
code for presenting a graphical representation of the path. 

23. The computer program product of claim 15, further comprising:

program code for providing quantitative phenotypes for each of multiple
stimuli; and

program code for using all the quantitative phenotypes to provide a
reduced-dimensionality space in which to depict the path.

24. The computer program product of claim 15, wherein the multiple levels of
stimulus are multiple times after an initial exposure to the stimulus. 

25. An apparatus for determining the response of cells to multiple levels of a 
stimulus from images of the cells, the apparatus comprising: 

an interface configured to receive the images of the cells that have been 
exposed to said multiple levels of a stimulus; 

a memory for storing, at least temporarily, some or all of the images; and

one or more processors in communication with the memory and designed or
configured to (i) obtain from said images a plurality of feature values, at least some of
which characterize the phenotype of cells exposed to the particular level of the
stimulus, to thereby produce a separate quantitative phenotype of the cells at each
level of stimulus; and (ii) identify a path through the separate quantitative phenotypes
of cells exposed to the stimulus.

26. The apparatus of claim 25, wherein the stimulus is selected from the group
consisting of exposure to a chemical compound, exposure to a biological agent,
exposure to electromagnetic radiation, exposure to particle radiation, exposure to an
electrical or magnetic field or force, exposure to a mechanical field or force, and
combinations thereof.

27. The apparatus of claim 25, wherein the one or more processors are further
designed or configured to compare the path to a different path produced for a different 
stimulus to which cells were exposed at multiple levels. 

28. The apparatus of claim 25, wherein the feature values characterize one or
more of cell morphology, a statistical property of the cells, and a biological
classification of the cells.

29. The apparatus of claim 25, further comprising a display for presenting a
graphical representation of the path provided by said one or more processors.

30. The apparatus of claim 29, wherein the graphical representation is provided
along one or more principal components obtained via a principal component analysis.

31. A method for determining whether a first compound and a second compound
act on cells by a related mechanism of action, the method comprising:

for each of multiple concentrations of the first compound, obtaining a plurality
of feature values characterizing the phenotype of cells exposed to the particular
concentration of the first compound, to thereby produce a plurality of first
concentration-specific phenotypes;

identifying a first path through the first concentration-specific phenotypes of
cells exposed to the first compound;

for each of multiple concentrations of the second compound, obtaining a
plurality of feature values characterizing the phenotype of cells exposed to the
particular concentration of the second compound, to thereby produce a plurality of
second concentration-specific phenotypes;

identifying a second path through second concentration-specific phenotypes of
cells exposed to the second compound; and

comparing the first and second paths, wherein a degree of similarity between
the paths corresponds to a degree of similarity in the mechanism of action of the first
and second compounds.

32. The method of claim 31, wherein at least one of the first and second
compounds is a known therapeutic or potential therapeutic. 

33. The method of claim 31, wherein the multiple concentrations of the first 
compound vary from lowest to highest by a factor of at least about two.

34. The method of claim 31, wherein the multiple concentrations of the first 
compound include at least five separate concentrations of the first compound. 

35. The method of claim 31, wherein the multiple concentrations of the first 
compound include at least eight separate concentrations of the first compound.

36. The method of claim 31, wherein obtaining a plurality of feature values
characterizing the phenotype of cells exposed to the particular concentration of the
first compound comprises analyzing images of a population of cells exposed to the
particular concentration of the first compound.

37. The method of claim 31, wherein the plurality of feature values include 
numeric values characterizing one or more of the following cellular components: 
DNA, Golgi, cytoskeletal components, and the plasma membrane. 

38. The method of claim 31, wherein the plurality of feature values include
numeric values characterizing one or more of the following cellular components: 
DNA, Golgi, and tubulin. 

39. The method of claim 31, wherein identifying the first path comprises analyzing
the first concentration-specific phenotypes via one or more of the following 
techniques: principal component analysis, linear and non-linear discriminant analysis, 
multidimensional scaling, and projection pursuit techniques. 

40. The method of claim 31, wherein identifying the first path comprises analyzing
the first concentration-specific phenotypes using principal component analysis.

41. The method of claim 31, wherein comparing the first and second paths
comprises graphically depicting the first and second paths together. 

42. The method of claim 41, wherein the graphical depiction presents the first and 
second paths in a space defined by principal components.

43. A computer program product comprising a machine readable medimn on 
which is provided program instructions for deterinining whether a first compoimd and 
a second compound act on cells by a related mechanism of action, the program 

10 instructions comprising: 

program code for obtaining, for each of multiple concentrations of the first 
compound, a plurality of feature values characterizing the phenotype of cells exposed 
to the particular concentration of the first compound, to thereby produce a plurality of 
first concentration-specific phenotypes; 
15 program code for identifying a first path through the first concentration- 

specific phenotypes of cells exposed to the first conq)ound; 

program code for obtaining, for each of multiple concentrations of the second 
compound, a plurality of feature values characterizing the phenotype of cells exposed
to the particular concentration of the second compound, to thereby produce a plurality
of second concentration-specific phenotypes;

program code for identifying a second path through second
concentration-specific phenotypes of cells exposed to the second compound; and

program code for comparing the first and second paths, wherein a degree of 
similarity between the paths corresponds to a degree of similarity in the mechanism of 
action of the first and second compounds.
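
The steps recited in claim 43 can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: the claim does not prescribe a similarity measure, so the mean point-to-point Euclidean distance used here is an illustrative choice, and the function names, compound data, and concentrations are all hypothetical.

```python
import math

def build_path(phenotypes_by_conc):
    """Order concentration-specific phenotype vectors by concentration."""
    return [vec for _, vec in sorted(phenotypes_by_conc.items())]

def path_distance(path_a, path_b):
    """Mean Euclidean distance between corresponding points of two paths.

    Assumes both compounds were sampled at the same number of concentrations.
    A smaller value is read here as a more similar path.
    """
    if len(path_a) != len(path_b):
        raise ValueError("paths must have the same number of points")
    return sum(math.dist(p, q) for p, q in zip(path_a, path_b)) / len(path_a)

# Hypothetical data: {concentration: feature values} for three compounds.
compound_a = {0.1: [0.0, 0.0], 1.0: [1.0, 1.0], 10.0: [2.0, 3.0]}
compound_b = {0.1: [0.0, 0.5], 1.0: [1.0, 1.5], 10.0: [2.0, 3.5]}
compound_c = {0.1: [0.0, 0.0], 1.0: [-1.0, 0.0], 10.0: [-3.0, 0.0]}

path_a, path_b, path_c = (build_path(c) for c in (compound_a, compound_b, compound_c))
d_ab = path_distance(path_a, path_b)  # paths that run close together
d_ac = path_distance(path_a, path_c)  # paths that diverge with concentration
```

Under this illustrative measure, compounds A and B follow nearly the same path and would be grouped as acting by a related mechanism, while C diverges from A as concentration increases.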

44. The computer program product of claim 43, wherein at least one of the first 
and second compounds is a known therapeutic or potential therapeutic.

45. The computer program product of claim 43, wherein the program code for
obtaining a plurality of feature values characterizing the phenotype of cells exposed to
the particular concentration of the first compound comprises program code for
analyzing images of a population of cells exposed to the particular concentration of 
the first compound. 

46. The computer program product of claim 43, wherein the plurality of feature 
values include numeric values characterizing one or more of the following cellular 
components: DNA, Golgi, cytoskeletal components, and the plasma membrane. 

47. The computer program product of claim 43, wherein the program code for
identifying the first path comprises program code for analyzing the first
concentration-specific phenotypes via one or more of the following techniques:
principal component analysis, linear and non-linear discriminant analysis,
multidimensional scaling, and projection pursuit techniques.

48. The computer program product of claim 43, wherein the program code for 
comparing the first and second paths comprises program code for graphically
depicting the first and second paths together.

49. The computer program product of claim 48, wherein the graphical depiction 
presents the first and second paths in a space defined by principal components. 






[FIGURE 1: flowchart. Box text, in order of appearance:
IDENTIFY COLLECTION OF CHEMICAL COMPOUNDS FOR USE IN ANALYSIS;
SELECT CURRENT COMPOUND;
SET NEW COMBINATION OF DOSE AND CELL LINE;
IMAGE CELLS OF CURRENT CELL LINE AT CURRENT DOSE;
MEASURE AND STORE VALUES OF MULTIPLE PHENOTYPIC FEATURES (AT MULTIPLE TIMES IF NECESSARY);
MORE COMBINATIONS OF DOSE AND CELL LINE? (YES / NO);
COMBINE FEATURE VALUES ACROSS MULTIPLE CELL LINES TO OBTAIN SEPARATE PHENOTYPIC VECTORS FOR EACH DOSE;
a second decision with NO / YES branches (box text not legible);
REDUCE DIMENSIONALITY OF FEATURE SPACE AND DEPICT RESPONSE PATHS FOR EACH COMPOUND;
COMPARE SEPARATE PATHS OF THE COMPOUNDS.
Reference numerals shown: 101, 103, 105, 107, 109, 111, 113, 115, 117, 121.]

FIGURE 1
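
The data-collection loop of Figure 1 can be sketched as below. This is a minimal sketch under stated assumptions: `measure_features` is a hypothetical stand-in for the imaging and feature-extraction steps and returns dummy numbers, the compound and cell-line names are invented, and "combining" feature values across cell lines is interpreted here as simple concatenation, which the flowchart does not specify.

```python
def measure_features(compound, dose, cell_line):
    """Hypothetical stand-in for imaging cells and extracting feature values."""
    # Deterministic dummy values so the sketch runs without image data.
    return [float(len(compound)) * dose, float(len(cell_line)) + dose]

def phenotypic_vectors(compound, doses, cell_lines):
    """Return {dose: vector}, concatenating feature values across cell lines."""
    stored = {}
    # Visit every combination of dose and cell line, then measure and store.
    for dose in doses:
        for cell_line in cell_lines:
            stored[(dose, cell_line)] = measure_features(compound, dose, cell_line)
    # Combine across cell lines to obtain one phenotypic vector per dose.
    return {dose: [value for cell_line in cell_lines
                   for value in stored[(dose, cell_line)]]
            for dose in doses}

# Hypothetical compound, doses, and cell lines.
vectors = phenotypic_vectors("compound_1", [0.1, 1.0], ["line_A", "line_B"])
```

Each resulting vector has one block of feature values per cell line, so the vectors for successive doses form the points through which a response path can later be drawn.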



SUBSTITUTE SHEET (RULE 26) 






[Figure 2: block diagram. Labeled blocks: CD-ROM; MASS STORAGE; INTERFACE; PROCESSOR(S); NETWORK CONNECTION; PRIMARY STORAGE (A); PRIMARY STORAGE (B). Reference numerals shown: 200, 202, 204, 206, 208, 210, 212, 214.]

Figure 2







[Figure 3A: plot legend. Color by TREATMENT.MOA: Calmodulin; control; Endoplasmic reticulum Ca2+ ATPase; Farnesyl transferase; Geranylgeranyltransferase I; Mitochondrial ATPases; Oxidative Phosphorylation; p38 MAP kinase; PKC; Topoisomerase II; Tubulin; V-ATPase. Markers are connected by TR.QR.UID, and ordered by CONCENTRATION. The labels show TREATMENT.NAME.]

Figure 3A







Legend 

Farnesyl transferase (green),
Geranylgeranyltransferase (blue) and
Mitochondrial inhibitors (pink and gray)
(See legend for Figure 3A for additional details) 

Legend

Actin inhibitors (red)

Tubulin depolymerizers (purple)

(See legend for Figure 3A for additional details)

Legend

ER-Ca+2 ATPase inhibitors (black)

G protein activators (blue)

Kinase inhibitors (brown and gray)

(See legend for Figure 3A for additional details)

Red data points are from one run
Blue from a run 3 weeks later

Primary hits

The subset of compounds that were identified in the primary screen 

Legend

The region of the PCA space that represents the optimal profile for 
inhibition of a Target 



Figure 11

Legend

Zoomed-in figure of the compounds in PCA space in Figure 11.
Primary hits (red)
Secondary analogs (black)
Optimized hits (blue)

Identification of Optimal
Profile 